Audio: STFT Process: Add Xtensa HiFi function versions#10638
singalsu wants to merge 2 commits into thesofproject:main
Pull request overview
Adds HiFi3 SIMD implementations for STFT hot-path helpers and refactors shared, non-SIMD-specific routines into a common compilation unit to reduce MCPS.
Changes:
- Add HiFi3 intrinsic implementations of `stft_process_apply_window()` and `stft_process_overlap_add_ifft_buffer()`.
- Move source/sink and buffer-fill helper functions from `stft_process-generic.c` into `stft_process_common.c`.
- Introduce Kconfig SIMD level selection and update build sources to include the HiFi3 unit.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/audio/stft_process/stft_process_common.c | Adds shared source/sink and FFT buffer fill helpers (moved from generic). |
| src/audio/stft_process/stft_process-hifi3.c | New HiFi3 intrinsic implementations for windowing + overlap-add. |
| src/audio/stft_process/stft_process-generic.c | Removes moved helpers; wraps generic implementations behind SOF_USE_HIFI(NONE, ...). |
| src/audio/stft_process/Kconfig.simd | Adds Kconfig choice for SIMD optimization level selection. |
| src/audio/stft_process/Kconfig | Includes the new SIMD Kconfig via rsource. |
| src/audio/stft_process/CMakeLists.txt | Adds the HiFi3 compilation unit to the build. |
Comments suppressed due to low confidence (1)
src/audio/stft_process/stft_process-hifi3.c:1
- The function relies on 64-bit alignment and even-sample constraints but does not enforce either at runtime. Misalignment can cause load/store exceptions or significant penalties depending on the core/config, and "even samples" is already required to avoid the `>> 1` infinite-loop hazard. Add an explicit alignment/size assertion (or a guarded scalar fallback when `(uintptr_t)obuf->w_ptr` is not 8-byte aligned or when the contiguous region before wrap is odd-length) to make failures deterministic and easier to diagnose.
// SPDX-License-Identifier: BSD-3-Clause
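The guarded scalar fallback suggested in this comment could look roughly like the sketch below. It is not code from the PR: `is_aligned_8()`, `simd_pair_ok()`, `sat_add32()` and `guarded_add()` are hypothetical names, and the pair loop is written in plain C where a HiFi3 build would use 64-bit loads and stores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when ptr sits on an 8-byte (64-bit) boundary. */
static bool is_aligned_8(const void *ptr)
{
	return ((uintptr_t)ptr & 7u) == 0;
}

/* Hypothetical helper: true when a run of 32-bit samples can be handled
 * two at a time, i.e. the start is 8-byte aligned and the count is even.
 */
static bool simd_pair_ok(const int32_t *ptr, size_t samples)
{
	return is_aligned_8(ptr) && (samples & 1u) == 0;
}

/* Saturating 32-bit add, computed in 64 bits to avoid overflow. */
static int32_t sat_add32(int32_t a, int32_t b)
{
	int64_t sum = (int64_t)a + b;

	if (sum > INT32_MAX)
		return INT32_MAX;
	if (sum < INT32_MIN)
		return INT32_MIN;
	return (int32_t)sum;
}

/* Adds src into dst. Returns the number of samples handled by the pair
 * path, or 0 when the guard failed and the scalar fallback was used.
 */
static size_t guarded_add(int32_t *dst, const int32_t *src, size_t samples)
{
	size_t i;

	if (!simd_pair_ok(dst, samples) || !is_aligned_8(src)) {
		/* Scalar fallback: no alignment or parity requirement. */
		for (i = 0; i < samples; i++)
			dst[i] = sat_add32(dst[i], src[i]);
		return 0;
	}

	/* Pair path: stands in for one 64-bit SIMD operation per iteration. */
	for (i = 0; i < samples; i += 2) {
		dst[i] = sat_add32(dst[i], src[i]);
		dst[i + 1] = sat_add32(dst[i + 1], src[i + 1]);
	}
	return samples;
}
```

Because the guard is evaluated before either loop runs, an odd count or a misaligned pointer takes the slow path instead of faulting or overrunning the buffer.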
Force-pushed from a543f3b to 33e8ab8
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Force-pushed from 33e8ab8 to a8b7178
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Force-pushed from a8b7178 to 8b1e458
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Force-pushed from 8b1e458 to 68005ba
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
This patch adds to stft_process-hifi3.c the HiFi3 versions of the higher-complexity functions stft_process_apply_window() and stft_process_overlap_add_ifft_buffer().

The functions with no clear HiFi optimization benefit are moved from stft_process-generic.c to stft_process_common.c. Those functions move data with practically no processing of the samples.

The stft_process_setup() function is changed to allocate buffers with mod_balloc_align() to ensure that a 32-bit sample pair or complex number is aligned for 64-bit Xtensa SIMD. This patch also adds checks on other parameters to ensure the STFT is set up in a way that can be executed.

The patch also fixes an oversized allocation in setup. The window function buffer is common to all channels, so its allocation should not be multiplied by the channel count.

This change saves 17 MCPS (from 63 MCPS to 46 MCPS). The test was done with the script run:

    scripts/rebuild-testbench.sh -p mtl
    scripts/sof-testbench-helper.sh -x -m stft_process_1024_256_ \
        -p profile-stft_process.txt

The above STFT used FFT length 1024 with hop 256.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
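The allocation points in this commit message can be illustrated with a small host-side sketch. It is not the SOF implementation: C11 `aligned_alloc()` stands in for `mod_balloc_align()`, and `struct stft_sketch`, `stft_sketch_init()`, `stft_sketch_free()` and the specific parameter checks are hypothetical. The sketch sizes the shared window buffer by the FFT length only, and multiplies only the per-channel buffer by the channel count.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIMD_ALIGN 8u /* bytes in one 64-bit SIMD word */

/* Hypothetical state, not the SOF struct. */
struct stft_sketch {
	int32_t *window;     /* one window, shared by all channels */
	int32_t *state;      /* per-channel data, channels * fft_size samples */
	size_t window_bytes;
	size_t state_bytes;
};

/* aligned_alloc() expects a size that is a multiple of the alignment. */
static size_t round_up(size_t bytes, size_t align)
{
	return (bytes + align - 1) / align * align;
}

/* Returns 0 on success, -1 on invalid parameters or allocation failure. */
static int stft_sketch_init(struct stft_sketch *s, size_t fft_size,
			    size_t hop, size_t channels)
{
	memset(s, 0, sizeof(*s));

	/* Illustrative checks only: even FFT size so samples pair up,
	 * and a hop that is non-zero and no longer than the FFT size.
	 */
	if (!channels || !hop || hop > fft_size || (fft_size & 1u))
		return -1;

	/* The window is common to all channels: no channels factor here. */
	s->window_bytes = fft_size * sizeof(int32_t);
	s->state_bytes = channels * fft_size * sizeof(int32_t);

	s->window = aligned_alloc(SIMD_ALIGN, round_up(s->window_bytes, SIMD_ALIGN));
	s->state = aligned_alloc(SIMD_ALIGN, round_up(s->state_bytes, SIMD_ALIGN));
	if (!s->window || !s->state) {
		free(s->window);
		free(s->state);
		memset(s, 0, sizeof(*s));
		return -1;
	}
	return 0;
}

static void stft_sketch_free(struct stft_sketch *s)
{
	free(s->window);
	free(s->state);
	memset(s, 0, sizeof(*s));
}
```

With an FFT length of 1024 and two channels, the window takes 4096 bytes in this sketch. Multiplying it by the channel count would have doubled that for no benefit.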
This patch removes the fill_start_idx member from struct stft_process_fft. Keeping it would have required another check of data alignment and sample count in the Xtensa HiFi SIMD code version. This component has no need for different FFT padding types (left, center, right as in MFCC), so the member is safe to remove.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Force-pushed from 68005ba to f51925c
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
This patch adds to stft_process-hifi3.c the HiFi3 versions of the higher-complexity functions stft_process_apply_window() and stft_process_overlap_add_ifft_buffer().
The functions with no clear HiFi optimization benefit are moved from stft_process-generic.c to stft_process_common.c. Those functions move data with practically no processing of the samples.
This change saves 17 MCPS (from 63 MCPS to 46 MCPS). The test was done with the script run:

    scripts/rebuild-testbench.sh -p mtl
    scripts/sof-testbench-helper.sh -x -m stft_process_1024_256_ \
        -p profile-stft_process.txt

The above STFT used FFT length 1024 with hop 256.
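
For readers unfamiliar with the two optimized operations, the following scalar sketch shows what windowing and overlap-add do in general. It is not the code from this PR: it assumes Q1.31 fixed-point samples in a plain `int32_t` buffer and a circular output buffer, and `q31_mul()`, `apply_window()` and `overlap_add_ring()` are hypothetical names. The HiFi3 versions would do the same arithmetic on two 32-bit values per 64-bit SIMD operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q1.31 multiply with rounding and saturation. The right shift of a
 * negative 64-bit value is assumed to be arithmetic, as on GCC and Clang.
 */
static int32_t q31_mul(int32_t a, int32_t b)
{
	int64_t p = ((int64_t)a * b + ((int64_t)1 << 30)) >> 31;

	if (p > INT32_MAX)
		return INT32_MAX;
	if (p < INT32_MIN)
		return INT32_MIN;
	return (int32_t)p;
}

/* Saturating 32-bit add, computed in 64 bits to avoid overflow. */
static int32_t sat_add32(int32_t a, int32_t b)
{
	int64_t sum = (int64_t)a + b;

	if (sum > INT32_MAX)
		return INT32_MAX;
	if (sum < INT32_MIN)
		return INT32_MIN;
	return (int32_t)sum;
}

/* Windowing: scale each sample by the matching window coefficient. */
static void apply_window(int32_t *buf, const int32_t *win, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		buf[i] = q31_mul(buf[i], win[i]);
}

/* Overlap-add: accumulate an n-sample frame into a circular buffer,
 * starting at w_idx and wrapping once at the end of the buffer.
 */
static void overlap_add_ring(int32_t *ring, size_t ring_len, size_t w_idx,
			     const int32_t *frame, size_t n)
{
	size_t first = ring_len - w_idx; /* samples before the wrap */
	size_t i;

	assert(w_idx < ring_len && n <= ring_len);
	if (first > n)
		first = n;

	for (i = 0; i < first; i++)
		ring[w_idx + i] = sat_add32(ring[w_idx + i], frame[i]);
	for (i = first; i < n; i++)
		ring[i - first] = sat_add32(ring[i - first], frame[i]);
}
```

In a real overlap-add the caller advances the write index by the hop size (256 in the test above) after each frame, not by the frame length, so consecutive frames overlap and sum.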